Eigenvalue Decay Implies Polynomial-Time Learnability for Neural Networks
We consider the problem of learning function classes computed by neural networks with various activations (e.g. ReLU or Sigmoid), a task believed to be computationally intractable in the worst case. A major open problem is to understand the minimal assumptions under which these classes admit provably efficient algorithms. In this work we show that a natural distributional assumption corresponding to eigenvalue decay of the Gram matrix yields polynomial-time algorithms in the non-realizable setting for expressive classes of networks (e.g.
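The abstract's central quantity is the spectrum of the Gram (kernel) matrix built from the data distribution. As a minimal sketch of what "eigenvalue decay" means empirically, the following computes the spectrum of an RBF Gram matrix on synthetic Gaussian data; the kernel choice, bandwidth, and data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 2
X = rng.standard_normal((n, d))  # synthetic data, for illustration only

# Gaussian (RBF) Gram matrix: K[i, j] = exp(-||x_i - x_j||^2 / (2 * d)),
# with an ad-hoc bandwidth of d (an assumption, not the paper's choice).
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / (2.0 * d))

# Eigenvalues sorted largest-first; a rapidly shrinking tail is the
# "eigenvalue decay" condition the abstract refers to.
eigs = np.linalg.eigvalsh(K)[::-1]
print(eigs[:5])  # inspect the leading eigenvalues
```

Since the kernel is positive semidefinite with unit diagonal, the eigenvalues are nonnegative and sum to n; the informal intuition is that fast decay makes the function class effectively low-dimensional, which is what enables efficient learning.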
d0f5edad9ac19abed9e235c0fe0aa59f-AuthorFeedback.pdf
We thank the reviewer for providing constructive feedback and suggestions. In this view, the specific regime in which our rates are better is not really important; instead, we just consider it an interesting setting that researchers have ignored for a long time. An example of a "strong" assumption is the case of zero Bayes error w.r.t. the square loss: whether an assumption is strong is a problem-dependent judgment. Our final comment concerned the bias of the community towards "weak assumptions", so we are happy that the reviewer engaged with us in this discussion! Moreover, we do plan to extend the results we presented to smooth classification losses, such as the squared hinge loss.
72e6d3238361fe70f22fb0ac624a7072-AuthorFeedback.pdf
We thank all reviewers for their helpful feedback. Below we address the questions and comments individually. We will correct typos in the main text and bibliography, and refer to Figure 1 in the introduction. We apologize for the confusion. The VAMP framework does not capture our "aligned" or "misaligned" cases.